Parallel K-Means Algorithm on Distributed Memory Multiprocessors
Abstract
Clustering large data sets can be time-consuming and processor-intensive. This project implements a parallel version of a popular clustering algorithm, the k-means algorithm, to provide faster clustering. The algorithm was tested by creating 3, 4, 5, and 7 clusters on a cluster of Sun workstations. Optimal speedup was not achieved, but the benefits of parallelization were observed. The method exploits the inherent data parallelism in the k-means algorithm and uses the message-passing model.
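The data-parallel, message-passing scheme described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: points are split into contiguous blocks across P processes; in each Lloyd-style iteration every process assigns its own points and computes partial per-cluster sums and counts; a sum all-reduce (the role `MPI_Allreduce` with `MPI_SUM` plays in a real MPI program) combines them so every process computes the same new centroids. The processes are simulated sequentially in one interpreter, and the names `local_stats`, `allreduce`, `parallel_kmeans`, the `init` parameter, and the relative-SSE stopping rule are hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]
Stats = Tuple[List[List[float]], List[int], float]


def local_stats(chunk: Sequence[Point], centroids: Sequence[Point]) -> Stats:
    """One process's work on its own block: assign each local point to the
    nearest centroid and accumulate per-cluster coordinate sums and counts."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    sse = 0.0
    for x in chunk:
        dists = [sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centroids]
        j = min(range(k), key=dists.__getitem__)  # ties go to the lower index
        counts[j] += 1
        sse += dists[j]
        for t in range(d):
            sums[j][t] += x[t]
    return sums, counts, sse


def allreduce(parts: Sequence[Stats]) -> Stats:
    """Stand-in for a sum all-reduce (e.g. MPI_Allreduce with MPI_SUM): the
    only communication step; its size depends on k and d, not on n."""
    k, d = len(parts[0][0]), len(parts[0][0][0])
    sums = [[sum(p[0][j][t] for p in parts) for t in range(d)] for j in range(k)]
    counts = [sum(p[1][j] for p in parts) for j in range(k)]
    return sums, counts, sum(p[2] for p in parts)


def parallel_kmeans(points: Sequence[Point], k: int, num_procs: int,
                    init: Optional[Sequence[Point]] = None, max_iter: int = 100,
                    tol: float = 1e-9, seed: int = 0):
    n = len(points)
    # Block distribution (assumption): process r owns a contiguous slice.
    chunks = [points[r * n // num_procs:(r + 1) * n // num_procs]
              for r in range(num_procs)]
    centroids = list(init) if init is not None else random.Random(seed).sample(list(points), k)
    prev_sse = float("inf")
    sse = prev_sse
    for _ in range(max_iter):
        # On real hardware each call runs on its own processor, concurrently.
        partials = [local_stats(chunk, centroids) for chunk in chunks]
        sums, counts, sse = allreduce(partials)
        # Every process holds identical totals, so all compute the same centroids;
        # an empty cluster keeps its previous centroid.
        centroids = [tuple(s / counts[j] for s in sums[j]) if counts[j] else centroids[j]
                     for j in range(k)]
        # Stop once the relative drop in SSE is negligible (hypothetical rule).
        if prev_sse - sse <= tol * sse:
            break
        prev_sse = sse
    return centroids, sse


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
           (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
    print(parallel_kmeans(pts, k=2, num_procs=3, init=[(0.0, 0.0), (10.0, 10.0)]))
    # ([(0.5, 0.5), (10.5, 10.5)], 4.0)
```

Because k-means converges to a local optimum, the example fixes the initial centroids; with the same `init`, any number of processes yields the same clustering as the sequential run. Each iteration sends only k·d + k + 1 numbers in the all-reduce regardless of n, while each process does roughly n·k·d/P distance work, so when n/P is small the fixed communication cost weighs more. This is one plausible reason speedup can fall short of optimal on a network of workstations.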
Related articles
Load Balancing for Extrapolation Methods on Distributed Memory Multiprocessors
We present a parallel algorithm for extrapolation methods on distributed memory multiprocessors combining different levels of parallelism. A detailed analysis that uses appropriate primitives for communication shows that a sophisticated load balancing scheme is required to achieve a good speedup. We characterize an optimal load balancing based on Lagrange multipliers and investigate several si...
Parallel K-Means Algorithm for Shared Memory Multiprocessors
Clustering is the task of assigning a set of instances to groups in such a way that the dissimilarity of instances within each group is minimized. Clustering is widely used in areas such as data mining, pattern recognition, machine learning, image processing, and computer vision. K-means is a popular clustering algorithm which partitions instances into a fixed number of clusters in an...
A Data-Clustering Algorithm on Distributed Memory Multiprocessors
To cluster increasingly massive data sets that are common today in data and text mining, we propose a parallel implementation of the k-means clustering algorithm based on the message-passing model. The proposed algorithm exploits the inherent data parallelism in the k-means algorithm. We analytically show that the speedup and the scaleup of our algorithm approach the optimal as the number of dat...
Automatic Localization for Distributed-Memory Multiprocessors Using a Shared-Memory Compilation Framework
In this paper, we outline an approach for compiling for distributed-memory multiprocessors that is inherited from compiler technologies for shared-memory multiprocessors. We believe that this approach to compiling for distributed-memory machines is promising because it is a logical extension of the shared-memory parallel programming model, a model that is easier for programmers to work with, an...
Performance Characterization of Shared- and Distributed-Memory Multiprocessors on a Tree Search Problem
In this paper, we measure and compare the performance of shared- and distributed-memory multiprocessors using a parallel tree search problem to characterize these types of multiprocessors. We take the knapsack problem, solved with the branch-and-bound algorithm, as our workload. It is difficult to compare the performance using irregular parallel problems such as tree search problems because the parallelism...